Getting Set-up

Installing and Setting up RStudio

The R console looks like this:

File Organization

Make sure that you set up a folder for this class.

Using RMarkdown/knitr

You can knit the file. The first time you do this, you will need to make sure the knitr package is installed. You can knit to .html, .pdf, or Word (.docx). In this course we will generally knit to .html.

RMarkdown formatting

To make something “code-looking”, wrap it in grave accents (backticks, `), found in the upper left of your keyboard.

To create a header, place a hash sign (#) at the start of the line, followed by a space. For example, # Header 1 creates a level 1 header and ## Header Level 2 creates a level 2 header.

To italicize text, put one asterisk on each side of it *like this*. To make text bold, put two asterisks on each side **like this**.

To make a list, just start creating your list using a - or * for each bullet, like this:

- list item 1
- list item 2

It is important that there is a blank line before the first bullet.

Add a link with the following code (include the https:// prefix, otherwise the address is treated as a relative path):

[Alt text that will display](https://www.google.com)

It will display like this:

Alt text that will display

Add an image with the following code:

![Alt text](https://raw.githubusercontent.com/allisonhorst/stats-illustrations/master/rstats-artwork/rmarkdown_wizards.png)

It will display like this:

Alt text


Most Markdown syntax is covered in the RStudio RMarkdown Cheatsheet, Section 3.

R Chunks

Create an R chunk:

2+2
## [1] 4

OR

x<-4

echo=T or echo=F – determines whether or not to echo the source code in the output file. Setting echo=F can be useful if you are creating a document for someone who doesn’t need or want to see your code, just the output. In general, for assignments in this course, I would like your code to be echoed. The default is echo=T.

results=T or results=F – determines whether or not the results will be displayed. Setting results=F can be useful if you want to show code, but don’t care what the output is. The default is to display the results (knitr's documented default is results='markup').

eval=T or eval=F – determines whether or not to evaluate the code. Setting eval=F can be useful if you have a whole chunk of code you don’t want run, but you also don’t want to delete it. The default is eval=T.

There are many, many more options including fig.width, fig.height, cache, etc. The vast majority of options are available in the RStudio RMarkdown Cheatsheet, Section 5.

You can set options individually on each chunk and/or set global options with knitr::opts_chunk$set(your options here) in the first code chunk.
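
As a minimal sketch of setting global options, the chunk below could sit at the top of an RMarkdown document. It assumes the knitr package is installed; the particular values (echo = TRUE, a 6 by 4 inch figure) are illustrative choices, not course requirements.

```r
# Assumes knitr is already installed (install.packages("knitr") if not).
library(knitr)

# Set defaults for every chunk that follows; the values here are examples only.
knitr::opts_chunk$set(
  echo = TRUE,      # show source code in the knitted output
  fig.width = 6,    # default figure width in inches
  fig.height = 4    # default figure height in inches
)

# Check what the current global default for echo is
knitr::opts_chunk$get("echo")
```

An option set on an individual chunk overrides the global default for that chunk only.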

Inline Code

Rather than using a code chunk (which is set off on its own lines), you also have the option to use inline code. You can place the following within any sentence or paragraph.

`r codehere`

For example,

This is the number `r x`.

becomes… This is the number 4.

Installing Packages

Packages can contain lots of things including: data sets, functions, etc.

You can install packages using the packages tab or you can use the code install.packages('packageyouwant') in the console.

In each new R session where you want to use the package you will have to load it by typing library('packageyouwant') in the console (or in the RMarkdown document - more later).

To get help with a function in a package, you can type ?functionname into the console. Typing ?packagename also works for packages that provide an overview help page.
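
The install, load, and help steps above can be sketched as follows. The stats package ships with R, so it stands in here for a package you installed yourself; 'packageyouwant' remains a placeholder and the install line is commented out.

```r
# Install once per computer (placeholder name, so left commented out):
# install.packages("packageyouwant")

# Load the package in each new R session where you need it.
# stats comes with R and is used here only as a stand-in.
library(stats)

# Open the help page for one function from the package
?median

# List the documentation available for the whole package
help(package = "stats")
```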

Additional Reading (Optional)

Some Basic R code

Variables, Calculations, Vectors

Assigning Variables:
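
A minimal sketch of assignment, using the arrow operator shown earlier in these notes. The variable names and values are made up for illustration.

```r
# Store the number 4 in a variable called x
x <- 4

# Variables can also hold text (the course name here is made up)
course_name <- "Intro to R"

# Typing a variable's name prints its value
x
course_name
```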

Calculations:
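
A short sketch of basic arithmetic with variables; the values are arbitrary.

```r
x <- 4
y <- 10

x + y     # addition: 14
y - x     # subtraction: 6
x * y     # multiplication: 40
y / x     # division: 2.5
x^2       # exponent: 16
sqrt(x)   # square root: 2

# Save the result of a calculation in a new variable
total <- x + y
```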

Vectors:
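
A sketch of creating vectors with c(). The names and values are invented for illustration.

```r
# Combine several numbers into one vector
temps <- c(12.5, 15.0, 9.8)

# A vector of text values
bird_colors <- c("purple", "green")

# A sequence of whole numbers from 1 to 5
one_to_five <- 1:5

# How many elements does a vector have?
length(temps)   # 3
```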

Referencing Elements of a Vector:
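
A sketch of pulling elements out of a vector with square brackets. Note that R counts positions starting from 1. The vector is made up.

```r
temps <- c(12.5, 15.0, 9.8, 20.1)

temps[1]         # first element: 12.5
temps[2:3]       # second and third elements: 15.0 9.8
temps[c(1, 4)]   # first and fourth elements: 12.5 20.1
temps[-1]        # everything except the first element

# Keep specific elements in a new variable
first_and_last <- temps[c(1, 4)]
```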

Adding to Vectors:
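
This sketch reads "adding to" as appending new elements; element-wise arithmetic is shown at the end in case that was the intended meaning. The values are made up.

```r
temps <- c(12.5, 15.0, 9.8)

# Append a value to the end by combining the old vector with the new value
temps <- c(temps, 20.1)

# append() can also insert at a chosen position; after = 0 means the front
temps <- append(temps, 0, after = 0)
temps   # 0.0 12.5 15.0 9.8 20.1

# Arithmetic on a vector applies to every element
temps_plus_one <- temps + 1
```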

Importing Data

From a file on your computer:
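
A sketch of reading a .csv file with read.csv(). So that the example runs anywhere, it first writes a small made-up file to a temporary location; in practice you would pass the path to your own file in your class folder.

```r
# Create a small example file (made-up data) in a temporary location
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(id = 1:3, score = c(90, 85, 77)),
  csv_path,
  row.names = FALSE
)

# Read the file into a data frame
scores <- read.csv(csv_path)

# Look at the first few rows
head(scores)
```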

From the web:
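
read.csv() also accepts a URL in place of a file path. The web address in the comment below is hypothetical and only shows the shape of the call. The runnable part uses a file:// URL pointing at a temporary file of made-up data as a stand-in for a real web address.

```r
# Shape of the call for a real web address (hypothetical URL, not a real data set):
# survey <- read.csv("https://example.com/data/survey.csv")

# Runnable stand-in: write made-up data locally, then read it through a URL
local_copy <- tempfile(fileext = ".csv")
write.csv(
  data.frame(site = c("A", "B"), count = c(3, 7)),
  local_copy,
  row.names = FALSE
)
file_url <- paste0("file://", normalizePath(local_copy, winslash = "/"))

survey <- read.csv(file_url)
survey
```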

For now, we will mostly be working with .csv and .xls files. Later in the course, we may discuss other types of files.

Basics for Working with a Dataframe

Assessing Size:
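
A sketch using iris, a data set that ships with R, as a stand-in for whatever data frame you have imported.

```r
# Rows and columns together
dim(iris)    # 150 5

# Rows and columns separately
nrow(iris)   # 150
ncol(iris)   # 5

# A compact overview of each column and its type
str(iris)
```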

Names:
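
A sketch of viewing and changing column names, again using the built-in iris data as a stand-in. The copy named flowers is made so the original data set is left alone.

```r
# See the column names
names(iris)

# Work on a copy so the built-in data set stays unchanged
flowers <- iris

# Rename one column by matching its current name
names(flowers)[names(flowers) == "Species"] <- "species"
names(flowers)
```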

Referencing Columns:
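
A sketch of the common ways to refer to a column, using the built-in iris data as a stand-in.

```r
# By name with the dollar sign
sepal_lengths <- iris$Sepal.Length

# By name with double square brackets
species_column <- iris[["Species"]]

# By position: all rows, first column
first_column <- iris[, 1]

# Several columns at once, first three rows only
iris[1:3, c("Sepal.Length", "Species")]
```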

Calculations:
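
A sketch of calculating with columns, using the built-in iris data. The new column name Sepal.Area is made up, and multiplying length by width is only a rough illustration.

```r
# Summary statistics for one column
mean(iris$Sepal.Length)
max(iris$Petal.Width)

# Work on a copy so the built-in data set stays unchanged
flowers <- iris

# Create a new column from two existing ones (element by element)
flowers$Sepal.Area <- flowers$Sepal.Length * flowers$Sepal.Width
head(flowers)
```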

Conditional Subsetting:
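
A sketch of keeping only the rows that meet a condition, using the built-in iris data. The cutoff of 7 is arbitrary.

```r
# Rows where Species is "setosa"; the empty space after the comma keeps all columns
setosa <- iris[iris$Species == "setosa", ]
nrow(setosa)   # 50

# Rows where sepal length is greater than 7
long_sepals <- iris[iris$Sepal.Length > 7, ]

# subset() does the same thing with less typing
long_sepals_again <- subset(iris, Sepal.Length > 7)
```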

Best Practices

Commenting

  • Be sure to comment your code (in R, use a # before a line of comment)
  • The more descriptive you can be, the easier it will be for others to read (and for you to read later)

Naming

When naming variables, observations, data frames, or files, make them:

  • meaningful
  • consistent
  • concise
  • code and coder friendly

Other naming considerations:

  • avoid names that are already common function names (e.g. filter or mean)
  • consider making object names nouns, and function names verbs
  • it’s not the end of the world if you give something a bad name, but a good name will save you (and others) time and effort down the road
  • avoid formatting and symbols (e.g. spaces or &)
  • keep a clear record of your variable names as well as longer descriptions including units (e.g. surface_temp = surface temperature measurement on Mars in degrees Celsius)

Entering Things

Some suggestions for best practices:

  • be consistent (e.g. don't mix purple, Purple, and purple_)
  • put any additional information such as units or notes in a column separate from the value
  • if there are missing entries, enter the same thing for each missing value (it is common to use NA, NaN, -9999, or -); don’t leave cells blank
  • if data is abbreviated, make a record somewhere of what the abbreviations mean

Example

[Figures: bad data entry and good data entry, by @alisonhorst]

The Basics of Working with Missing Data
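
As a minimal sketch of what this topic involves, the example below shows how R marks missing values with NA and how to find and skip them. The measurements are made up.

```r
# NA marks a missing value
depths <- c(2.1, NA, 3.4, NA, 5.0)

# Which entries are missing?
is.na(depths)

# How many are missing?
n_missing <- sum(is.na(depths))   # 2

# Most summaries return NA if any value is missing...
mean(depths)

# ...unless you ask R to remove missing values first
mean_depth <- mean(depths, na.rm = TRUE)   # 3.5
```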

Working with Factors
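
As a minimal sketch of what this topic involves, the example below turns text into a factor, which is how R stores categories. The category labels are made up, and the level order is set by hand so that it is not alphabetical.

```r
sizes <- c("small", "large", "medium", "small")

# Convert to a factor and state the order of the categories
size_factor <- factor(sizes, levels = c("small", "medium", "large"))

# The categories, in order
levels(size_factor)

# Count how many observations fall in each category
table(size_factor)

# Behind the scenes each value is stored as its level number
as.integer(size_factor)   # 1 3 2 1
```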